ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Han, Dong, Ai, Zhehong, Cai, Pengxiang, Lu, Shanya, Chen, Jianpeng, Ye, Zihao, Sun, Shuzhou, Gao, Ben, Ge, Lingli, Wang, Weida, Zhou, Xiangxin, Liu, Xihui, Su, Mao, Ouyang, Wanli, Bai, Lei, Zhou, Dongzhan, Xu, Tao, Li, Yuqiang, Zhang, Shufei

arXiv.org Artificial Intelligence

Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by sparse experimental data and vast search spaces. Here, we introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data- and knowledge-driven strategies. First, the data-driven strategy uses an 8B-scale LLM regressor, fine-tuned on a mere 1% of labeled samples, to generate pseudo-data that robustly initializes the optimization process. Second, the knowledge-driven strategy employs a hybrid retrieval-augmented generation approach to guide the LLM in dividing the search space while mitigating hallucinations. An Upper Confidence Bound algorithm then identifies high-potential subspaces within this partition. Operating on the LLM-refined subspaces and supported by LLM-generated data, BO gains both effectiveness and efficiency. Comprehensive evaluations across multiple scientific benchmarks demonstrate that ChemBOMAS sets a new state of the art, accelerating optimization by up to 5-fold compared to baseline methods.
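The subspace-selection step the abstract describes can be illustrated with a minimal UCB1-style sketch. This is our own illustration of the general technique, not ChemBOMAS's actual implementation: the scoring constant, data layout, and function names are assumptions.

```python
import math

def ucb_score(mean, count, total, c=1.0):
    """UCB1-style score: observed mean plus an exploration bonus
    that shrinks as a subspace accumulates evaluations."""
    if count == 0:
        return float("inf")  # unexplored subspaces are tried first
    return mean + c * math.sqrt(math.log(total) / count)

def select_subspace(subspaces):
    """Return the index of the subspace with the highest UCB score.

    `subspaces` is a list of dicts holding the running mean objective
    value ('mean') and the number of evaluations ('count') so far.
    """
    total = sum(s["count"] for s in subspaces) or 1
    scores = [ucb_score(s["mean"], s["count"], total) for s in subspaces]
    return max(range(len(subspaces)), key=lambda i: scores[i])

# Toy example: three subspaces with different observed reaction yields.
subspaces = [
    {"mean": 0.62, "count": 10},
    {"mean": 0.55, "count": 3},
    {"mean": 0.00, "count": 0},  # never sampled, so it is selected first
]
print(select_subspace(subspaces))  # -> 2
```

Once every subspace has been sampled, the exploration bonus trades off between the best-looking subspace and under-sampled ones, which is the behavior the paper relies on to focus BO on high-potential regions.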



Podcasts as a Medium for Participation in Collective Action: A Case Study of Black Lives Matter

Moldovan, Theodora, Pera, Arianna, Vega, Davide, Aiello, Luca Maria

arXiv.org Artificial Intelligence

We study how participation in collective action is articulated in podcast discussions, using the Black Lives Matter (BLM) movement as a case study. While research on collective action discourse has primarily focused on text-based content, this study takes a first step toward analyzing audio formats by using podcast transcripts. Using the Structured Podcast Research Corpus (SPoRC), we investigated spoken language expressions of participation in collective action, categorized as problem-solution, call-to-action, intention, and execution. We identified podcast episodes discussing racial justice after important BLM-related events in May and June of 2020, and extracted participatory statements using a layered framework adapted from prior work on social media. We examined the emotional dimensions of these statements, detecting eight key emotions and their association with varying stages of activism. We found that emotional profiles vary by stage, with different positive emotions standing out during calls-to-action, intention, and execution. We detected negative associations between collective action and negative emotions, contrary to theoretical expectations. Our work contributes to a better understanding of how activism is expressed in spoken digital discourse and how emotional framing may depend on the format of the discussion.


Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper

Mutisya, Fred, Gitau, Shikoh, Syovata, Christine, Oigara, Diana, Matende, Ibrahim, Aden, Muna, Ali, Munira, Nyotu, Ryan, Marion, Diana, Nyangena, Job, Ongoma, Nasubo, Mbae, Keith, Wamicha, Elizabeth, Mibuari, Eric, Nsengemana, Jean Philbert, Chidede, Talkmore

arXiv.org Artificial Intelligence

Large Language Models (LLMs) hold promise for improving healthcare access in low-resource settings, but their effectiveness in African primary care contexts remains under-explored. We present a rigorous methodology for creating a benchmark dataset and evaluation framework focused on Kenyan Level 2-3 (dispensary and health center) clinical care. Our approach leverages retrieval-augmented generation (RAG) to ground questions and answers in Kenya's national clinical guidelines, ensuring content aligns with the local standard of care. The guidelines were digitised, chunked, and indexed for efficient semantic retrieval. Gemini Flash 2.0 Lite was then prompted with relevant guideline excerpts to generate realistic clinical questions, multiple-choice answers, and reasoning scenarios with source citations in English and Swahili. We engaged Kenyan physicians in a co-creation process to refine the dataset's relevance and fairness, and instituted a blinded expert validation pipeline to review for clinical accuracy, clarity, and cultural appropriateness. The resulting Alama Health QA dataset comprises thousands of regulator-aligned question-answer pairs spanning common outpatient conditions in English and Swahili. Beyond standard accuracy metrics, we propose evaluation measures targeting clinical reasoning, safety, and adaptability. Initial results highlight significant performance gaps in state-of-the-art LLMs when confronted with localized scenarios, echoing recent findings that LLM accuracy on African medical questions lags behind performance on U.S. benchmarks. Our work demonstrates a pathway for dynamic, locally-grounded benchmarks that can evolve with guidelines, providing a crucial tool for safe and effective deployment of AI in African healthcare. Advances in large language models have spurred interest in their potential to augment medical services, especially in low- and middle-income countries facing clinician shortages (Bekbolatova et al., 2024).
By handling routine queries or providing decision support, LLMs might help bridge gaps in healthcare access across Africa.
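The digitise-chunk-index-retrieve pipeline described above can be sketched in a few lines. This is a generic illustration under stated assumptions: real systems (including, presumably, this one) use a learned sentence-embedding model rather than the toy bag-of-words similarity shown here, and all function names are our own.

```python
import math
import re
from collections import Counter

def chunk(text, size=40):
    """Split a guideline document into overlapping chunks of `size`
    words, with 50% overlap so passages are not cut mid-topic."""
    words = text.split()
    step = max(size // 2, 1)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def embed(text):
    """Toy bag-of-words 'embedding'; a production pipeline would call
    a semantic embedding model instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k indexed chunks most similar to the query; these
    excerpts would then be placed in the LLM prompt for grounding."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Retrieved excerpts ground the generator in the guideline text, which is what keeps the generated questions aligned with the local standard of care.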


Enhancements for Developing a Comprehensive AI Fairness Assessment Standard

Agarwal, Avinash, Kumar, Mayashankar, Nene, Manisha J.

arXiv.org Artificial Intelligence

Abstract--As AI systems increasingly influence critical sectors like telecommunications, finance, healthcare, and public services, ensuring fairness in decision-making is essential to prevent biased or unjust outcomes that disproportionately affect vulnerable entities or result in adverse impacts. This need is particularly pressing as the industry approaches the 6G era, where AI will drive complex functions like autonomous network management and hyper-personalized services. However, as AI applications diversify, the existing TEC Standard requires enhancement to strengthen its impact and broaden its applicability. This paper proposes an expansion of the TEC Standard to include fairness assessments for images, unstructured text, and generative AI, including large language models, ensuring a more comprehensive approach that keeps pace with evolving AI technologies. By incorporating these dimensions, the enhanced framework will promote responsible and trustworthy AI deployment across various sectors. The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) technologies has driven transformative advancements across critical sectors, including telecommunications, healthcare, finance, and public services.


Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning

Gawin, Cole, Sun, Yidan, Kejriwal, Mayank

arXiv.org Artificial Intelligence

Large language models (LLMs) have achieved remarkable performance in generating human-like text and solving reasoning tasks of moderate complexity, such as question answering and mathematical problem solving. However, their capabilities in tasks requiring deeper cognitive skills, such as common-sense understanding and abstract reasoning, remain under-explored. In this paper, we systematically evaluate abstract common-sense reasoning in LLMs using the ConceptNet knowledge graph. We propose two prompting approaches: instruct prompting, where models predict plausible semantic relationships based on provided definitions, and few-shot prompting, where models identify relations using examples as guidance. Our experiments with the gpt-4o-mini model show that instruct prompting yields consistent performance when ranking multiple relations, but performance declines substantially when the model is restricted to predicting only one relation. In few-shot prompting, the model's accuracy improves significantly when selecting from five relations rather than the full set, although with notable bias toward certain relations. These results suggest that significant gaps remain between the abstract common-sense reasoning abilities of even commercially used LLMs and human-level understanding. However, the findings also highlight the promise of careful prompt engineering, based on selective retrieval, for obtaining better performance.
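A few-shot prompt of the kind described, asking a model to pick the ConceptNet relation linking two concepts from a restricted candidate set, might be constructed as follows. The relation names are real ConceptNet relations; the prompt wording and function name are our own illustration, not the paper's exact template.

```python
def few_shot_prompt(examples, head, tail, candidate_relations):
    """Build a few-shot prompt asking for the ConceptNet relation
    between `head` and `tail`, restricted to `candidate_relations`.

    `examples` is a list of (head, relation, tail) demonstrations.
    """
    lines = [
        "Choose the relation that best links the two concepts.",
        "Relations: " + ", ".join(candidate_relations),
        "",
    ]
    for h, r, t in examples:
        lines.append(f"{h} -> {t}: {r}")
    lines.append(f"{head} -> {tail}:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    examples=[("dog", "IsA", "animal"), ("knife", "UsedFor", "cutting")],
    head="hammer",
    tail="driving nails",
    candidate_relations=["IsA", "UsedFor", "PartOf", "HasA", "Causes"],
)
print(prompt)
```

Restricting `candidate_relations` to five options rather than ConceptNet's full relation inventory is exactly the manipulation the abstract reports as improving accuracy.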


Can you pass that tool?: Implications of Indirect Speech in Physical Human-Robot Collaboration

Zhang, Yan, Ratnayake, Tharaka Sachintha, Sew, Cherie, Knibbe, Jarrod, Goncalves, Jorge, Johal, Wafa

arXiv.org Artificial Intelligence

Indirect speech acts (ISAs) are a natural pragmatic feature of human communication, allowing requests to be conveyed implicitly while maintaining subtlety and flexibility. Although advancements in speech recognition have enabled natural language interactions with robots through direct, explicit commands--providing clarity in communication--the rise of large language models presents the potential for robots to interpret ISAs. However, empirical evidence on the effects of ISAs on human-robot collaboration (HRC) remains limited. To address this, we conducted a Wizard-of-Oz study (N=36), engaging a participant and a robot in collaborative physical tasks. Our findings indicate that robots capable of understanding ISAs significantly improve perceived robot anthropomorphism, team performance, and trust. However, the effectiveness of ISAs is task- and context-dependent, thus requiring careful use. These results highlight the importance of appropriately integrating direct and indirect requests in HRC to enhance collaborative experiences and task performance.


Benchmarking Zero-Shot Facial Emotion Annotation with Large Language Models: A Multi-Class and Multi-Frame Approach in DailyLife

Zhang, He, Fu, Xinyi

arXiv.org Artificial Intelligence

This study investigates the feasibility and performance of using large language models (LLMs) to automatically annotate human emotions in everyday scenarios. We conducted experiments on the DailyLife subset of the publicly available FERV39k dataset, employing the GPT-4o-mini model for rapid, zero-shot labeling of key frames extracted from video segments. Under a seven-class emotion taxonomy ("Angry," "Disgust," "Fear," "Happy," "Neutral," "Sad," "Surprise"), the LLM achieved an average precision of approximately 50%. In contrast, when limited to ternary emotion classification (negative/neutral/positive), the average precision increased to approximately 64%. Additionally, we explored a strategy that integrates multiple frames within 1-2 second video clips to enhance labeling performance and reduce costs. The results indicate that this approach can slightly improve annotation accuracy. Overall, our preliminary findings highlight the potential application of zero-shot LLMs in human facial emotion annotation tasks, offering new avenues for reducing labeling costs and broadening the applicability of LLMs in complex multimodal environments.
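The multi-frame strategy described above, combining per-frame labels from a short clip into one annotation, can be sketched with a simple majority vote. The aggregation rule and names here are our own illustration of the general idea, not the paper's exact method.

```python
from collections import Counter

# The seven-class taxonomy used in the study.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]

def aggregate_frame_labels(frame_labels, fallback="Neutral"):
    """Combine per-frame emotion labels (e.g. from zero-shot LLM calls
    on key frames of a 1-2 second clip) into a single clip label by
    majority vote; ties fall back to `fallback`."""
    counts = Counter(frame_labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback  # no clear majority across frames
    return counts[0][0]

print(aggregate_frame_labels(["Happy", "Happy", "Sad"]))  # -> Happy
```

Voting across frames smooths out single-frame misreads, which is one plausible mechanism behind the slight accuracy improvement the abstract reports.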


Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care

Yuan, Fengpei, Hasnaeen, Nehal, Zhang, Ran, Bible, Bryce, Taylor, Joseph Riley, Qi, Hairong, Yao, Fenghui, Zhao, Xiaopeng

arXiv.org Artificial Intelligence

This study explores a novel approach to advancing dementia care by integrating socially assistive robotics, reinforcement learning (RL), large language models (LLMs), and clinical domain expertise within a simulated environment. This integration addresses the critical challenge of limited experimental data in socially assistive robotics for dementia care, providing a dynamic simulation environment that realistically models interactions between persons living with dementia (PLWDs) and robotic caregivers. The proposed framework introduces a probabilistic model to represent the cognitive and emotional states of PLWDs, combined with an LLM-based behavior simulation to emulate their responses. We further develop and train an adaptive RL system enabling humanoid robots, such as Pepper, to deliver context-aware and personalized interactions and assistance based on PLWDs' cognitive and emotional states. The framework also generalizes to computer-based agents, highlighting its versatility. Results demonstrate that the RL system, enhanced by LLMs, effectively interprets and responds to the complex needs of PLWDs, providing tailored caregiving strategies. This research contributes to human-computer and human-robot interaction by offering a customizable AI-driven caregiving platform, advancing understanding of dementia-related challenges, and fostering collaborative innovation in assistive technologies. The proposed approach has the potential to enhance the independence and quality of life for PLWDs while alleviating caregiver burden, underscoring the transformative role of interaction-focused AI systems in dementia care.